GPS-ABC: Gaussian Process Surrogate Approximate Bayesian Computation
Scientists often express their understanding of the world through a computationally demanding simulation program. Analyzing the posterior distribution of the parameters given observations (the inverse problem) can be extremely challenging. The Approximate Bayesian Computation (ABC) framework is the standard statistical tool for handling these likelihood-free problems, but it requires a very large number of simulations. In this work we develop two new ABC sampling algorithms that significantly reduce the number of simulations necessary for posterior inference. Both algorithms use confidence estimates for the acceptance probability in the Metropolis-Hastings step to adaptively choose the number of necessary simulations. Our GPS-ABC algorithm stores the information obtained from every simulation in a Gaussian process that acts as a surrogate function for the simulated statistics. Experiments on a challenging, realistic biological problem illustrate the potential of these algorithms.
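A minimal sketch of the surrogate idea (not the paper's exact algorithm): a toy one-dimensional simulator, a small numpy GP over its summary statistic, and a Metropolis-Hastings loop that runs the simulator only where the surrogate's predictive uncertainty is too high. The simulator, the epsilon kernel, and the 0.25 uncertainty threshold are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(theta):
    # Toy stand-in for an expensive simulator: a noisy summary statistic.
    return theta + 0.3 * rng.standard_normal()

def gp_posterior(X, y, x_star, ell=1.0, sf=1.0, noise=0.3):
    # Predictive mean/std of an RBF-kernel GP at a single test point.
    def k(a, b):
        return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)
    K = k(X, X) + noise**2 * np.eye(len(X))
    ks = k(X, np.array([x_star]))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = float(ks.T @ alpha)
    var = max(sf**2 - float(np.linalg.solve(L, ks).T @ np.linalg.solve(L, ks)), 1e-12)
    return mu, np.sqrt(var)

def surrogate_loglik(theta, X, y, y_obs, eps=0.2):
    # Epsilon-kernel ABC log-likelihood, with the surrogate's predictive
    # uncertainty folded into the kernel width.
    mu, sd = gp_posterior(X, y, theta)
    var = eps**2 + sd**2
    return -0.5 * (y_obs - mu)**2 / var - 0.5 * np.log(2 * np.pi * var), sd

y_obs = 1.0
X = np.array([-2.0, 0.0, 2.0])                 # initial design points
y = np.array([simulate(t) for t in X])
theta, samples = 0.0, []
for _ in range(2000):
    prop = theta + 0.5 * rng.standard_normal()
    _, sd = surrogate_loglik(prop, X, y, y_obs)
    if sd > 0.25:                              # surrogate too uncertain here:
        X = np.append(X, prop)                 # run the simulator once and
        y = np.append(y, simulate(prop))       # keep the result forever
    la, _ = surrogate_loglik(prop, X, y, y_obs)
    lb, _ = surrogate_loglik(theta, X, y, y_obs)
    if np.log(rng.uniform()) < la - lb:        # MH accept under a flat prior
        theta = prop
    samples.append(theta)
```

Because every simulation is stored in the GP, later iterations increasingly accept or reject proposals without touching the simulator at all.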
Optimization Monte Carlo: Efficient and Embarrassingly Parallel Likelihood-Free Inference
We describe an embarrassingly parallel, anytime Monte Carlo method for likelihood-free models. The algorithm starts from the view that the stochasticity of the pseudo-samples generated by the simulator can be controlled externally by a vector of random numbers u, in such a way that, given u, the outcome is deterministic. For each instantiation of u we run an optimization procedure to minimize the distance between the summary statistics of the simulator and of the data. After reweighting these samples using the prior and the Jacobian (accounting for the change of volume in transforming from the space of summary statistics to the space of parameters), we show that this weighted ensemble represents a Monte Carlo estimate of the posterior distribution. The procedure can be run embarrassingly parallel (each node handling one sample) and anytime (by allocating resources to the worst-performing sample). The procedure is validated on six experiments.
Comment: NIPS 2015 camera-ready
Hamiltonian ABC
Approximate Bayesian computation (ABC) is a powerful and elegant framework for performing inference in simulation-based models. However, due to the difficulty in scaling likelihood estimates, ABC remains useful only for relatively low-dimensional problems. We introduce Hamiltonian ABC (HABC), a set of likelihood-free algorithms that apply recent advances in scaling Bayesian learning using Hamiltonian Monte Carlo (HMC) and stochastic gradients. We find that a small number of forward simulations can effectively approximate the ABC gradient, allowing Hamiltonian dynamics to efficiently traverse parameter spaces. We also describe a new, simple yet general approach for incorporating random seeds into the state of the Markov chain, further reducing the random-walk behavior of HABC. We demonstrate HABC on several typical ABC problems, and show that HABC samples comparably to regular Bayesian inference using true
gradients on a high-dimensional problem from machine learning.
Comment: Submission to UAI 2015
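A rough sketch of the two ingredients, under assumed toy choices: the simulator's randomness is exposed as an explicit seed, the ABC gradient is a finite difference that reuses the same seeds on both sides (common random numbers), and a stochastic-gradient Langevin step stands in for full Hamiltonian dynamics.

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate(theta, seed):
    # Simulator with its randomness exposed through an explicit seed.
    r = np.random.default_rng(seed)
    return theta + r.standard_normal()

def abc_loglik(theta, y_obs, seeds, eps=0.5):
    # Kernel-ABC log-likelihood estimated from a few forward simulations.
    s = np.array([simulate(theta, int(sd)) for sd in seeds])
    return -0.5 * np.mean((s - y_obs) ** 2) / eps**2

def abc_grad(theta, y_obs, seeds, h=1e-4):
    # Finite-difference gradient; reusing the SAME seeds on both sides
    # (common random numbers) keeps the estimate low-variance.
    return (abc_loglik(theta + h, y_obs, seeds)
            - abc_loglik(theta - h, y_obs, seeds)) / (2 * h)

y_obs, theta, step = 1.5, 0.0, 1e-2
samples = []
seeds = rng.integers(0, 2**31, size=5)
for t in range(5000):
    if t % 50 == 0:                             # refresh seeds occasionally;
        seeds = rng.integers(0, 2**31, size=5)  # they are part of the state
    g = abc_grad(theta, y_obs, seeds)
    # Stochastic-gradient Langevin step: a simple stand-in for full HMC.
    theta += 0.5 * step * g + np.sqrt(step) * rng.standard_normal()
    samples.append(theta)
```

With fresh seeds at every step the finite difference would be dominated by simulator noise; holding them fixed across the +h and -h evaluations is what makes a handful of forward simulations enough.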
Nonparametric Bayesian methods for extracting structure from data
One desirable property of machine learning algorithms is the ability to balance the number of parameters in a model against the amount of available data. Incorporating nonparametric Bayesian priors into models is one approach to automatically adjusting model capacity to the amount of available data: with small datasets, models are less complex (fewer parameters need to be stored in memory), whereas with larger datasets, models are implicitly more complex (more parameters need to be stored in memory). Thus, nonparametric Bayesian priors satisfy frequentist intuitions about model complexity within a fully Bayesian framework. This thesis presents several novel machine learning models and applications that use nonparametric Bayesian priors. We introduce two novel models that use flat Dirichlet process priors. The first is an infinite mixture of experts model, which builds a fully generative, joint density model of the input and output space. The second is a Bayesian biclustering model, which simultaneously clusters the rows and columns of a data matrix into block-constant biclusters. The model is capable of efficiently processing very large, sparse matrices, enabling cluster analysis on incomplete data matrices. We also introduce binary matrix factorization, a novel matrix factorization model that, in contrast to classic factorization methods such as singular value decomposition, decomposes a matrix using binary latent factors.
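The capacity-grows-with-data behavior is easy to see in the Dirichlet process itself. A minimal sketch of its Chinese restaurant process predictive rule: the expected number of occupied clusters grows roughly as alpha * log(n), so more data implicitly buys a more complex model.

```python
import numpy as np

rng = np.random.default_rng(3)

def crp_partition(n, alpha=1.0):
    # Draw a partition of n items from the Chinese restaurant process,
    # the predictive rule induced by a Dirichlet process prior.
    counts = []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # open a new cluster ("table")
        else:
            counts[k] += 1
    return counts

for n in (10, 100, 1000):
    ks = [len(crp_partition(n)) for _ in range(50)]
    print(n, np.mean(ks))             # grows roughly like alpha * log(n)
```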
Nonparametric Bayesian Biclustering
We present a probabilistic block-constant biclustering model that simultaneously clusters the rows and columns of a data matrix. All entries with the same row cluster and column cluster form a bicluster. Each cluster is part of a mixture having a nonparametric Bayesian prior. The number of biclusters is therefore treated as a nuisance parameter and is implicitly integrated over during simulation. Missing entries are integrated out of the model entirely, which not only bypasses the common requirement of biclustering algorithms that missing values be filled in before analysis, but also makes the model robust to high rates of missing values. Using a Gaussian model for the density of entries within biclusters yields an efficient sampling algorithm, because the bicluster parameters can be analytically integrated out. We present several inference procedures for sampling cluster indicators, including Gibbs and split-merge moves. We show that our method is competitive with, if not superior to, existing imputation methods, especially at high missing rates, despite imputing constant values for entire blocks of data. We present imputation experiments and exploratory biclustering results.
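A minimal sketch of the two properties the abstract leans on, with assumed hyperparameters: the Gaussian block mean is integrated out analytically (standard conjugate-Gaussian evidence), and NaN entries simply drop out of the block's sufficient statistics rather than being imputed.

```python
import numpy as np

def block_loglik(X, rows, cols, r, c, sigma=1.0, mu0=0.0, tau=1.0):
    # Marginal log-likelihood of bicluster (r, c) with the block mean
    # integrated out under a N(mu0, tau^2) prior; NaN entries simply
    # drop out of the sufficient statistics -- no imputation needed.
    vals = X[np.ix_(rows == r, cols == c)]
    vals = vals[~np.isnan(vals)]
    n = vals.size
    if n == 0:
        return 0.0
    lam0, lam = 1.0 / tau**2, 1.0 / sigma**2      # prior / observation precisions
    lam_n = lam0 + n * lam                        # posterior precision
    mu_n_num = lam0 * mu0 + lam * vals.sum()      # = lam_n * posterior mean
    return (-0.5 * n * np.log(2 * np.pi * sigma**2)
            + 0.5 * np.log(lam0 / lam_n)
            - 0.5 * (lam * (vals**2).sum() + lam0 * mu0**2
                     - mu_n_num**2 / lam_n))

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 4))
X[0, 1] = np.nan                       # a missing entry needs no fill-in
rows = np.array([0, 0, 1, 1, 1, 0])    # row-cluster indicators
cols = np.array([0, 1, 0, 1])          # column-cluster indicators
total = sum(block_loglik(X, rows, cols, r, c)
            for r in range(2) for c in range(2))
```

A Gibbs move for one row's indicator just re-evaluates this collapsed score under each candidate assignment, which is why integrating the block parameters out makes sampling efficient.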
An alternative infinite mixture of Gaussian process experts
We present an infinite mixture model in which each component comprises a multivariate Gaussian distribution over an input space and a Gaussian process model over an output space. Our model deals neatly with non-stationary covariance functions, discontinuities, multimodality, and overlapping output signals. The work is similar to that of Rasmussen and Ghahramani [1]; however, we use a full generative model over the input and output space rather than just a conditional model. This allows us to deal with incomplete data, to perform inference over inverse functional mappings as well as regression, and it leads to a more powerful and consistent Bayesian specification of the effective 'gating network' for the different experts.
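A condensed sketch of the generative gating idea (illustrative, not the paper's inference procedure): each expert carries a Gaussian density over inputs and a GP over outputs, and the input densities themselves yield the responsibilities that mix the experts' predictions; the expert parameters below are assumed rather than learned.

```python
import numpy as np

def rbf(a, b, ell=1.0, sf=1.0):
    # Squared-exponential kernel for 1-D inputs.
    return sf**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

def gp_predict(x_tr, y_tr, x_te, noise=0.1):
    # GP regression posterior mean for one expert.
    K = rbf(x_tr, x_tr) + noise**2 * np.eye(len(x_tr))
    return rbf(x_te, x_tr) @ np.linalg.solve(K, y_tr)

def gauss_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2 * np.pi * var)

def predict(x_te, experts):
    # The experts' input densities act as the gating network:
    # responsibilities come from the generative model itself.
    dens = np.stack([e["pi"] * gauss_pdf(x_te, e["mu"], e["var"])
                     for e in experts])            # (K, n_test)
    resp = dens / dens.sum(axis=0)                 # posterior over experts
    preds = np.stack([gp_predict(e["x"], e["y"], x_te) for e in experts])
    return (resp * preds).sum(axis=0)

# Two hypothetical experts owning different input regions.
rng = np.random.default_rng(5)
x1 = rng.normal(-2, 0.5, 20); x2 = rng.normal(2, 0.5, 20)
experts = [
    {"pi": 0.5, "mu": -2.0, "var": 0.25, "x": x1, "y": np.sin(x1)},
    {"pi": 0.5, "mu": 2.0, "var": 0.25, "x": x2, "y": np.cos(x2)},
]
y_hat = predict(np.linspace(-3, 3, 7), experts)
```

Because the gating comes from densities over the inputs, the same machinery can be run in reverse, conditioning on outputs to infer inputs, which is what enables inference over inverse functional mappings.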